
Conversation

@rainj-me (Collaborator) commented Aug 22, 2025

Motivation

#9490

  • Support CUDA 13.0 with custom FlashInfer and TRT-LLM kernels (Aug 25, 2025)
  • Support sm_110 and sm_121 on CUDA 13.0
  • Support --compress-mode=size on CUDA 13.0
  • Keep sm_101 support on CUDA 12.8/12.9

Test

  • Step 1, pull the NVIDIA PyTorch 25.08 image:
docker pull nvcr.io/nvidia/pytorch:25.08-py3
  • Step 2, run the container with an interactive bash shell
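For example (the exact flags are illustrative; adjust GPU access and mounts to your setup):
docker run --gpus all -it --rm --ipc=host nvcr.io/nvidia/pytorch:25.08-py3 bash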
  • Step 3, clone the branch with this change
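One way to do this, assuming the PR branch lives on the author's fork (the fork URL is a guess):
git clone -b dev/support_cuda130 https://github.com/rainj-me/sglang.git
cd sglang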
  • Step 4, comment out the torch dependency in sgl-kernel/pyproject.toml; the patch looks like:
diff --git a/sgl-kernel/pyproject.toml b/sgl-kernel/pyproject.toml
index 52ee620e4..177e49e57 100644
--- a/sgl-kernel/pyproject.toml
+++ b/sgl-kernel/pyproject.toml
@@ -1,7 +1,7 @@
 [build-system]
 requires = [
   "scikit-build-core>=0.10",
-  "torch>=2.8.0",
+  # "torch>=2.8.0",
   "wheel",
 ]
  • Step 5, patch python/pyproject.toml to match the torch version shipped in the container; in my container the patch looks like:
diff --git a/python/pyproject.toml b/python/pyproject.toml
index c23efbc2e..b29789d45 100644
--- a/python/pyproject.toml
+++ b/python/pyproject.toml
@@ -49,7 +49,7 @@ runtime_common = [
     "scipy",
     "timm==1.0.16",
     "tiktoken",
-    "torchao==0.9.0",
+    "torchao==0.12.0+git",
     "transformers==4.55.2",
     "uvicorn",
     "uvloop",
@@ -59,21 +59,19 @@ runtime_common = [
 srt = [
     "sglang[runtime_common]",
     "sgl-kernel==0.3.5",
-    "torch==2.8.0",
-    "torchaudio==2.8.0",
+    "torch==2.8.0a0+34c6371d24.nv25.8",
     "torchvision",
     "cuda-python",
-    "flashinfer_python==0.2.11.post3",
+    "flashinfer_python==0.2.14.post1",
 ]
 
 blackwell = [
     "sglang[runtime_common]",
     "sgl-kernel",
-    "torch==2.8.0",
-    "torchaudio==2.8.0",
+    "torch==2.8.0a0+34c6371d24.nv25.8",
     "torchvision",
     "cuda-python",
-    "flashinfer_python==0.2.11.post3",
+    "flashinfer_python==0.2.14.post1",
 ]
  • Step 6, install sgl-kernel:
CUDA_VERSION=13.0 CMAKE_BUILD_PARALLEL_LEVEL="$(nproc)" SKBUILD_BUILD_DIR=./build CMAKE_ARGS="-DCMAKE_POLICY_VERSION_MINIMUM=3.5"  pip install -v .
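
A quick sanity check after the install (minimal, not part of the PR):
python -c "import sgl_kernel, torch; print(torch.version.cuda)"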

Modifications

  • Use a custom FlashInfer build to support CUDA 13.0 and load the TRT-LLM kernels (Aug 21, 2025)
  • Fix the cub::Sum / cub::Max issue so the code builds on both CUDA 12.x and CUDA 13.0 (see the sketch after this list)
  • Use torch 2.8.x and CUDA 13.0
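
For context on the cub fix: CUDA 13.0 ships CCCL 3.x, which drops the cub::Sum / cub::Max function objects in favor of the cuda::std / cuda functional objects. A minimal sketch of a version guard, with hypothetical alias names (SumOp/MaxOp are illustrative, not the PR's actual identifiers):

// Compatibility aliases so reduction ops compile on both CUDA 12.x and 13.0.
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#if CUDART_VERSION >= 13000
#include <cuda/functional>        // cuda::maximum
#include <cuda/std/functional>    // cuda::std::plus
using SumOp = cuda::std::plus<>;  // replaces cub::Sum
using MaxOp = cuda::maximum<>;    // replaces cub::Max
#else
using SumOp = cub::Sum;           // removed from cub in CUDA 13.0's CCCL
using MaxOp = cub::Max;
#endif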

Accuracy Tests

Benchmarking and Profiling

Checklist

@rainj-me force-pushed the dev/support_cuda130 branch from 0621958 to e85f927 on August 26, 2025 at 02:55
@rainj-me marked this pull request as ready for review on August 26, 2025 at 02:56
@rainj-me changed the title from "support cuda 13.0 and trtllm kernel by Aug 21 2025" to "support cuda 13.0 and trtllm kernel by Aug 25 2025" on Aug 26, 2025
@FlamingoPg self-assigned this on Aug 26, 2025
@rainj-me (Collaborator, Author) commented:

[Screenshot: built wheels, 2025-08-26]
  • sgl_kernel-0.3.6.post2-cp310-abi3-linux_x86_64_1.whl, built with the --compress-mode=size flag: wheel size 322 MB
  • sgl_kernel-0.3.6.post2-cp310-abi3-linux_x86_64.whl, built without the flag: wheel size 379 MB
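
For reference, --compress-mode=size is an nvcc option that trades decompression speed for smaller fatbins. One plausible way to pass it through this build, assuming the CMake setup honors CMAKE_CUDA_FLAGS (a sketch, not necessarily how the PR wires it):

CUDA_VERSION=13.0 CMAKE_ARGS="-DCMAKE_POLICY_VERSION_MINIMUM=3.5 -DCMAKE_CUDA_FLAGS=--compress-mode=size" pip install -v .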

@johnnynunez (Contributor) commented Aug 26, 2025

@zhyncs @rainj-me @faradawn LGTM

@FlamingoPg (Collaborator) left a comment:

Overall LGTM, wait for CI

@rainj-me merged commit 79e6a8a into sgl-project:main on Aug 27, 2025 (21 of 54 checks passed)
@rainj-me (Collaborator, Author) commented:

Follow-up: #9680

@rainj-me changed the title from "support cuda 13.0 and trtllm kernel by Aug 25 2025" to "support cuda 13.0 and trtllm kernel" on Aug 27, 2025
@voipmonitor (Contributor) commented:

@rainj-me how do you build the sgl-kernel, please? A simple "make build"? I'm getting this error:

/workspace/sglang/sgl-kernel/build/_deps/repo-mscclpp-src/include/mscclpp/atomic_device.hpp:10:10: fatal error: cuda/atomic: No such file or directory

(using the same procedure you described)

@zhyncs (Member) commented Aug 27, 2025

This breaks the latest build on B200 with cu128; I'll revert this first. @rainj-me

nvcc fatal   : Unsupported gpu architecture 'compute_103'

@johnnynunez (Contributor) commented Aug 27, 2025

> This breaks the latest build on B200 with cu128; I'll revert this first. @rainj-me
>
> nvcc fatal   : Unsupported gpu architecture 'compute_103'

This must be built with CUDA 13; compute_103 is GB300.
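
If you need to keep building on cu128 in the meantime, one untested workaround, assuming the build respects an explicit arch list via CMAKE_CUDA_ARCHITECTURES, is to exclude the CUDA-13-only architectures:

CUDA_VERSION=12.8 CMAKE_ARGS="-DCMAKE_POLICY_VERSION_MINIMUM=3.5 -DCMAKE_CUDA_ARCHITECTURES=90;100;120" pip install -v .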

@rainj-me (Collaborator, Author) commented:

> @rainj-me how do you build the sgl-kernel, please? A simple "make build"? I'm getting this error:
>
> /workspace/sglang/sgl-kernel/build/_deps/repo-mscclpp-src/include/mscclpp/atomic_device.hpp:10:10: fatal error: cuda/atomic: No such file or directory
>
> (using the same procedure you described)

Try the following command to build:

CUDA_VERSION=13.0 CMAKE_BUILD_PARALLEL_LEVEL="$(nproc)" SKBUILD_BUILD_DIR=./build CMAKE_ARGS="-DCMAKE_POLICY_VERSION_MINIMUM=3.5"  pip install -v .

@johnnynunez (Contributor) commented Aug 27, 2025

> @rainj-me how do you build the sgl-kernel, please? A simple "make build"? I'm getting this error:
>
> /workspace/sglang/sgl-kernel/build/_deps/repo-mscclpp-src/include/mscclpp/atomic_device.hpp:10:10: fatal error: cuda/atomic: No such file or directory
>
> (using the same procedure you described)

This error occurs because the include path does not point at the new CCCL location. For example, for CUDA 13 on sbsa:

export CPLUS_INCLUDE_PATH=/usr/local/cuda-13.0/targets/sbsa-linux/include/cccl

That resolves the problem for me on GH200.
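
On x86_64 the analogous path would presumably be:

export CPLUS_INCLUDE_PATH=/usr/local/cuda-13.0/targets/x86_64-linux/include/cccl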


https://developer.nvidia.com/blog/whats-new-and-important-in-cuda-toolkit-13-0/

@voipmonitor (Contributor) commented:

> @rainj-me how do you build the sgl-kernel, please? A simple "make build"? I'm getting this error:
> /workspace/sglang/sgl-kernel/build/_deps/repo-mscclpp-src/include/mscclpp/atomic_device.hpp:10:10: fatal error: cuda/atomic: No such file or directory
> (using the same procedure you described)
>
> Try the following command to build:
>
> CUDA_VERSION=13.0 CMAKE_BUILD_PARALLEL_LEVEL="$(nproc)" SKBUILD_BUILD_DIR=./build CMAKE_ARGS="-DCMAKE_POLICY_VERSION_MINIMUM=3.5" pip install -v .

Thank you. Is it worth trying the TRT-LLM kernels on the sm120 architecture (RTX PRO 6000), mostly for FP8 blockwise or compressed FP8 scale?

MahmoudAshraf97 pushed a commit to MahmoudAshraf97/sglang that referenced this pull request Sep 8, 2025